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Executive  Summary 


In  order  to  avoid  detection  by  hostile  radar  systems,  it  is  essential  for  modem  military 
platforms  to  have  the  capability  to  detect  deterministic  signals  buried  in  white  noise.  A 
wide  variety  of  detection  techniques  have  been  developed  in  answer  to  this  problem;  their 
efficiency  depends  on  the  degree  of  knowledge  of  the  incoming  signal.  This  report  is 
applicable  to  the  case  where  nothing  is  known  a  priori  about  the  parameters  of  the 
incoming  signal,  and  where  the  background  noise  is  Gaussian. 

Two  approaches  are  compared.  The  first  is  a  “normality”  detector,  which 
constructs  a  histogram  of  the  incoming  signal  data  points,  and  then  compares  this  to  the 
assumed  Gaussian  distribution  of  the  background  noise  by  means  of  a  chi-squared  test  to 
determine  the  presence  or  absence  of  deterministic  signal.  The  second  is  called  the 
recurrence  plot  detector  since  it  employs  the  time-delay  vectors  used  to  construct  the 
recurrence  matrix  -  a  concept  that  has  shown  great  utility  for  analyzing  dynamical 
systems.  This  detector  constructs  a  histogram  from  the  distances  between  the  time-delay 
vectors,  and  then  uses  a  chi-squared  test  to  compare  it  to  the  Gamma-like  distribution  that 
results  when  only  Gaussian  noise  is  present.  The  recurrence  plot  detector  has  the 
advantages  of  a  larger  number  of  data  points  as  well  as  the  explicit  incorporation  of  time 
information  in  the  incoming  signal.  Unfortunately,  there  may  exist  redundancies  among 
the  larger  number  of  recurrence  values  that  weaken  the  perfonnance  of  this  approach. 
However,  by  judicious  choice  of  the  embedding  parameters,  the  deleterious  effect  of 
redundancies  are  shown  to  be  minimized. 

Comparison  of  the  perfonnance  of  both  detectors  are  given  for  two  types  of 
incoming  signals  in  the  form  of  Receiver  Operating  Characteristic  (ROC)  curves.  In  each 
case,  there  exists  a  range  of  signal-to-noise  ratios  for  which  the  recurrence  plot  detector 
offers  superior  performance  over  the  nonnality  detector,  thereby  demonstrating  that  the 
recurrence  matrix  concept  has  utility  for  signal  detection.  Appendices  provide 
derivations  of  the  relevant  probability  density  functions  as  well  as  a  procedure  for 
constructing  “ideal”  Gaussian  background  data  sets. 


M  anuscript  approved  A  ugust  21,  2007. 
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Introduction 

Detection  of  deterministic  signals  buried  in  white  noise  is  an  essential  tactical 
requirement  for  present-day  military  platforms  in  order  to  avoid  detection  by  hostile  radar 
systems.  However,  current  low-probability  of  intercept  (LPI)  radar  systems  spread  the 
energy  they  transmit  over  a  large  spectral  range,  reducing  the  power  in  the  signal  to  a 
level  below  the  thermal  noise  in  the  target  platform’s  radar-warning  receiver. 

A  wide  array  of  detection  techniques  exist  for  finding  a  signal  in  background 
noise.  In  each  the  general  approach  is  the  same:  compute  a  test  statistic  from  the  data  and 
compare  it  to  a  detection  threshold.  Which  statistic  to  compute  and  where  to  set  the 
threshold  depend  on  the  amount  of  a  priori  knowledge  one  has  about  the  signal  and  on 
the  costs  assigned  to  a  missed  detection.  For  example,  when  detecting  known  signals  in 
additive  Gaussian  noise  an  optimum  receiver  (test  statistic)  has  been  developed  based  on 
the  likelihood  ratio,  yielding  the  Matched  Filter.  The  detection  threshold  can  be  set  using 
any  one  of  a  number  of  criteria  for  optimality  including:  maximum  a  posteriori  (MAP), 
Bayes’  criterion,  and  Neyman-Pearson  criterion  to  name  a  few.  An  extensive  literature 
has  also  evolved  to  handle  cases  where  the  parameters  of  the  signal  are  only  partially 
known.  A  thorough  review  is  given  in  McDonough  and  Whalen1 .  The  case  of  interest  in 
this  paper  is  that  in  which  nothing  is  known  a  priori  about  the  parameters  of  the  incoming 
signal.  This  is  the  most  difficult  detection  scenario,  and  some  form  of  normality  detector 
is  really  the  only  option. 

This  paper  explores  the  utility  of  recurrence  plots  for  detecting  signals  hidden  in 
noise,  when  the  signal  is  completely  unknown  and  the  background  noise  is  Gaussian. 

The  recurrence  plot  was  originally  constructed  as  a  graphical  tool  to  check  stationarity  in 
dynamical  systems".  It  essentially  quantifies  correlations  in  a  time  series  by  keeping 
track  of  when  the  signal  “recurs”;  that  is,  returns  to  a  previous  state.  It  has  proven  to  be 
very  useful  in  both  depicting  and  analyzing  complex,  nonlinear,  especially  chaotic, 
systems.  Examples  may  be  found  in  fields  as  diverse  as  economy  and  earth  science, 
astrophysics  and  physiology3.  Applications  of  recurrence  plot  techniques  to  signal 
detection  have  also  appeared  in  the  literature4,5,6.  Zbilut  et  al’  used  a  detector  based  on 
recurrence  plots  to  distinguish  detenninistic  signals  from  noise.  A  detector  based  on 
cross-recurrence  plots  was  used  to  extract  signals  from  noise  and  was  compared  to  a 
spectral  detection  scheme5.  Recurrence  plots  were  also  used  to  distinguish  signal  from 
noise  in  physiologically  generated  signals  (EMG  signal)  in  Filligoi  .7 

One  promising  approach  to  signal  detection  based  on  recurrence  plots  focuses  on 
the  structure  of  the  recurrence  plot  itself8.  Essentially  this  approach  starts  with  the 
recognition  that  the  points  in  the  recurrence  plot  for  pure  Gaussian  noise  are  unifonnly 
distributed  (except  for  the  diagonal)  so  that  the  existence  of  structured  dark  and  light 
patterns  must  be  due  to  a  signal.  This  analysis  develops  various  metrics  to  quantify  these 
patterns  and  tests  their  performance  in  detecting  signals. 

This  paper  will  present  a  different  approach,  based  on  the  statistical  properties  of 
the  points  in  the  unthresholded  recurrence  plot.  For  a  signal-in-noise  data  stream 
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consisting  of  M  independent  measurements  the  NxN  unthresholded  recurrence  matrix  is 
formed.  This  results  in  N(N-l)/2  data  points  as  the  recurrence  matrix  is  symmetric  about 
the  diagonal.  One  potential  advantage  of  recurrence-based  detectors  is  therefore  that  the 
number  of  data  points  available  for  processing  is  increased  from  M  to  N(N-l)/2  points, 
where  in  practice  N  is  nearly  as  large  as  M.  Additionally,  the  time  delay  parameter  used 
in  constructing  the  recurrence  plot  explicitly  incorporates  the  time  information  in  the 
incoming  signal.  This  information  is  not  included  in  a  nonnality  detector.  But,  there  is 
the  disadvantage  that  there  may  be  redundancies  among  the  N(N-l)/2  data  points  now 
under  consideration.  If  these  redundancies  are  excessive,  the  additional  data  points  will 
actually  be  detrimental  rather  than  advantageous.  Judicious  choices  for  the  embedding 
parameters  of  the  recurrence  matrix  will  minimize  this  problem. 

In  this  paper,  both  the  normality  detector  and  recurrence-based  detector  are 
compared  using  a  chi-squared  test.  For  the  nonnality  detector,  the  M  data  points  are  used 
to  construct  a  histogram,  which  is  compared  to  the  Gaussian  distribution  of  the 
background  noise  in  order  to  test  the  hypothesis  of  whether  a  signal  is  absent  or  present. 
The  other  detector,  refened  to  as  the  recurrence  plot  detector,  instead  uses  the  N(N-l)/2 
distances  between  data  points  in  the  recurrence  matrix  to  construct  a  histogram  which  is 
compared  to  a  Gamma-like  distribution  (derived  in  the  text)  of  distances  between  the 
Gaussian-distributed  noise  vectors.  The  expected  and  observed  distributions  for  both 
detectors  are  compared  using  a  chi-squared  test  in  order  to  determine  the  absence  or 
presence  of  a  signal. 

Even  though  it  has  been  shown9  that  the  expectation  of  unthresholded  recurrence 
plots  can  often  be  computed  simply  from  second-order  statistical  quantities,  this  paper 
will  directly  address  the  question  of  whether  the  advantages  (an  increase  in  available  data 
from  M  to  N(N-l)/2  points  and  the  inclusion  of  temporal  correlations)  are  sufficient  to 
evince  low  power  signals  for  which  the  recurrence  plot  detector  provides  a  perfonnance 
gain  over  the  nonnality  detector. 

The  manuscript  is  organized  as  follows.  First,  there  will  be  a  brief  explanation  of 
recurrence  plots,  which  introduces  the  mathematical  notation  for  this  paper.  Next,  the 
probability  distributions  relevant  to  this  analysis  will  be  derived.  As  the  two  detection 
approaches  are  fonnulated  in  detail,  the  importance  of  the  precision  required  for  the 
Gaussian  distribution  representing  the  background  noise  will  be  discussed,  as  well  as 
binning  techniques  for  the  chi-squared  test  employed.  Results  are  presented  in  the  form 
of  Receiver  Operating  Characteristic  (ROC)  curves  for  various  signal  types  and 
parameter  values  for  which  the  recurrence  plot  detector  provides  a  detection  performance 
gain  are  identified.  Extensive  calculations  are  required  since  the  analysis  is  sensitive  to 
several  significant  parameters;  viz,  sampling  rate,  embedding  dimension,  delay  time,  and 
a  chi-squared  statistics  adjustment  factor  (to  be  defined  in  the  text). 
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Recurrence  Plots 


Recurrence  plots  were  originally  intended  as  a  tool  for  analyzing  the  output  of  nonlinear, 
dynamical  systems  and  have  been  used  to  estimate  a  host  of  quantities  pertaining  to  the 
dimensionality  and  stability  of  the  system  being  observed3.  The  goal  in  this  work, 
however,  is  to  use  the  recurrence  plot  as  a  tool  for  exploring  correlations  in  time  series 
data  and  not  to  draw  inference  about  the  system  that  produced  the  signal. 


Consider  the  vector 

x  =  |x(l),  x(2), x{m), . . .  x(M)} 


consisting  of  M  real  data  values  sampled  at  discrete  times  t  =  in  At.  From  this  single 
vector,  a  new  family  of  N  time-delay  vectors  each  of  length  n<M  is  constructed  by  means 
of  a  time  delay  embedding 

Xt  =  {x(i), x(i  +  L),..., x{i  +  (n -  1)Z)}  i  =  1, 2, . . . , N 


where  n  is  referred  to  as  the  embedding  dimension  and  L  is  a  measure  of  time  delay. 
Clearly  M=N+(n-l)L.  Hence  the  M  one-dimensional  data  points  have  given  rise  to  N 
vectors  of  dimension  n.  Prescriptions  for  selecting  L,n  can  be  found  in10,  however,  these 
approaches  were  designed  for  use  with  the  output  of  a  deterministic  dynamical  system 
subject  to  relatively  low  levels  of  noise.  By  contrast  our  application  involves  very  high 
noise  levels,  thus  the  standard  algorithms  will  fail.  The  approach  used  here,  therefore,  is 
to  vary  both  L  and  n  as  parameters  associated  with  the  proposed  detector. 

The  unthresholded  recurrence  matrix  is  the  NxN  matrix  \dt  ]  where 

du=\Xi-Xj\ 

is  the  distance  between  the  vectors  Xi  and  .  In  this  paper,  this  distance  will  always  be 

the  Euclidean  distance;  that  is,  the  square  root  of  the  sums  of  the  squares  of  the 
corresponding  component  differences.  Other  definitions  of  distance  have  been  used  in 
other  applications.  This  unthresholded  matrix  will  be  the  concept  of  interest  in  this  paper. 

Note  that  in  the  literature,  a  standard  recurrence  plot  compares  the  distances  to  a 
pre-detennined  threshold  value  £ ,  and  a  black  dot  is  placed  at  location  (i,j)  if  dt  .  <  s  or  a 

white  dot  if  dt  .  >  £  .  So  for  any  thresholds  >  0  ,  the  main  diagonal  of  the  recurrence  plot 

will  always  be  black.  By  convention,  the  (1,1)  location  is  the  lower  left  entry  of  the 
recurrence  plot  and  the  (N,  N)  location  is  in  the  upper  right  corner.  Mathematically,  the 
entries  of  the  recurrence  plot  are  given  by 
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where  0  is  the  Heaviside  step  function 


so  a  value  of  1  corresponds  to  a  black  dot,  and  0  to  a  white  dot. 

Consider  the  following  simple  example.  Suppose  the  original  data  vector  is 
x  =  (0,4, 2, 3, 7, 9}  and  letn=3  andX=l.  Then  iV=6-(2xl)=4  and 

Xx  = {0,4,2} 

X2=  {4,2,3} 

X3=  {2,3,7} 

XA  ={3,7,9} 


The  unthresholded  recurrence  plot  is  then  given  by 
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Note  that  in  this  example  the  time  delay  vectors  are  all  distinct,  but  there  are  clearly 
redundancies  in  the  distance  values. 

Standard  recurrence  plots  have  been  useful  in  many  applications  because  they 
provide  evidence  of  the  underlying  structure  of  a  system.  For  example,  compare  the 
recurrence  plots  for  a  deterministic  periodic  function,  a  chaotic  system,  and  random  noise 
in  Fig.  (1). 


Gaussian  Noise 


50  100  150  200 


Sine  Wave 


50  100  150  200 


Fig.  1.  Example  unthresholded  recurrence  plots  for  Gaussian  noise,  a  sine-wave, 
and  the  output  of  the  chaotic  Lorenz  system. 


Lorenz  output 


50  100  150  200 


5 


Probability  Distributions 


In  order  to  formulate  the  signal  detection  strategies  to  be  assessed  in  this  paper,  some 
basic  probability  distributions  need  to  be  calculated.  In  these  derivations,  xl,x2,...,xn  are 
defined  to  be  independent  Gaussian  random  variables,  each  with  zero  mean  and 
variance  cr2 .  The  Xj  used  here  are  arbitrary  random  variables  and  are  not  related  to  the  X, 
defined  already  as  the  delay  vectors.  Additionally,  the  index  n  is  not  necessarily  related 
to  the  embedding  dimension  although  later  it  will  be  interpreted  as  such.  For  the 
purposes  of  our  analysis,  we  require  the  probability  distribution  of  the  length  and  the 
length  squared  of  the  vector  (x1;  x2, . . .,  xn ) .  Each  x,.  has  probability  density  function 


PM 


-xf  Her 


i  =  1,2,.. 


n . 


We  first  derive  the  probability  density  function  for  r  =  Jxf  +  x2  H —  +  x2  .  This  can  be 

achieved  readily  if  we  can  obtain  a  one-to-one  transformation  on  Rn  in  which  r  is  one  of 
the  transformed  variables  and  r  is  also  separable  from  the  other  variables  in  the  joint 
probability  density  function  of  the  transformed  variables.  A  transfonnation  of  variables 
(Xj ,  x2 , . . . ,  xn )  — >  (r ,  0X , . . . ,  0n  A  )  satisfying  these  requirements  is  given  by 


Xj  =  r  cos  0X 
x2  =  r  sin  6X  cos  02 
x3  =  r  sin  6X  sin  d2  cos  02 


x„_i  =  r  sin  0X  sin  02...  sin  dn_2  cos  0n_x 
xn  =  r  sin  6X  sin  d2...  sin  0n2  sin  0n_  x 


where  0  <  r  <  oo  and  0  <  6t  <  n  .  Then 


p(r,0x,...0n_x)  =  JnY[p(xt) 

i= 1 

where  J n  is  the  Jacobian  of  the  transformation.  Inserting  the  formula  for  ,J n  computed  in 
Appendix  A  yields 


p(r,6x,...,  0n_x )  =  r"-'  (sin  0X  )'"2  (sin  92 )"  ’ ...  sin  Gn_2  \ 

cr 

The  terms  containing  0x,...,0n_x  may  be  integrated  out  to  get p(r),  but,  since  they  are  all 
independent  of  r,  they  must  be  a  constant.  It  is  easier  to  simply  note  that 


i 

0  rr 
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where 


p(r )  =  Cr"  'e 


r2 /2a2 


C=  ]rn-xe^lla2dr 


y1 


since  a  probability  density  function  must  integrate  to  one.  Now 

r( 


-r  1 2(j* 


dr  =  - 


1 


V  2cr2  j 


so 


C  = 


(  i  A 


cr'T 


n!  2 


2J 


and 


/?(r)  = 


1 

v2y 


_ /It— > 

cr  r 


r"_1e~r  /2ct 


0  <  r  <  oo 


(1) 


where  T()  is  the  Gamma  function.  This  is  the  probability  density  function  of  the 
Euclidean  length  of  n-dimensional  vectors  in  R"  whose  components  are  i.i.d  and  drawn 
from  a  zero  mean  Gaussian  random  process  with  variance  cr".  The  mean  of  the  above 
distribution  is  given  by 


7 


jur  =  j  rp(r)dr 

0 


and  the  variance  by 

00 

<?;  =  ^r2p[r)dr-ju2 

o 


Note  that  in  the  special  case  where  n=3,  these  formulas  reduce  to 
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V  n ) 


Next  we  derive  the  distribution  of  r1  =  x2  +  x2  ■+ —  xn .  Making  the  change  of  variables 
u=r 2  yields 


p(u)  = 


p{r(  u)) 


du 


dr 

V 

v2  j 


(n- 1)/2  -u/2az 

u  e 


_ n-r- \ 

a-  r 


^  n  ^ 


2u 


1/2 


v2  j 


ii!  2-1  -u!  2<j 


cr  r 


0  <  u  <  oo 


(2) 


This  distribution  is  readily  recognized  as  a  Gamma  distribution,  or  more  particularly,  a 
Chi-square  distribution  with  n  degrees  of  freedom  when  o  =1.  For  this  distribution,  the 
mean  is  given  by 
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In  the  special  case  where  n  =  3,  these  formulas  become 

p{u)  = — 3  1 — uU2e^',2cJ  0<w<oo 
cr  v2;r 

A,  =  3<T 

o;=6a4 

The  fonnulas  derived  above  can  be  found  in  the  literature;  for  example,  a  proof 
for  (1)  by  means  of  a  geometrical  method  may  be  found  in  Parzen11  and  a  proof  for  (2) 
by  means  of  characteristic  functions  is  in  McDonough  and  Whelan1.  In  fact,  the  case  for 
n  =  2  and  3  is  found  in  many  textbooks  as  an  example  of  the  transformation  of  variables. 
The  straightforward  proof  by  induction  given  here  is  included  because  the  authors  have 
not  seen  it  in  the  literature. 

Formulas  (1)  and  (2)  give  the  probability  densities  of  the  length  and  the  length 
squared  for  a  vector  whose  components  are  independent  samples  from  the  same  Gaussian 
distribution  with  mean  0  and  variance  a2.  However,  what  is  needed  for  this  paper  is  the 
distance  between  two  vectors.  That  is,  let  xi,X2,. . .,xn  and  yi,y2,. . .,yn  be  independent 
random  variables  chosen  from  the  same  Gaussian  distribution  with  mean  0  and  variance 
a",  and  define 


r  =  \l(xi-yif +(x2-y2)2+---+(xn-yn)2  and 


u  =  r 


However,  note  that  if  x  and  y  are  independent  Gaussian  random  variables  with  mean  0 
and  variance  a“,  then  (x-y)  is  also  Gaussian,  again  with  mean  0  but  now  with  variance 
2a2.  In  fact,  we  can  relax  the  requirement  that  x  and  y  have  zero  mean.  As  long  as  they 
have  the  same  mean,  then  the  result  is  equally  valid.  Consequently,  for  the  case  of  the 
distance  between  two  vectors,  fonnulas  (1)  and  (2)  become 
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Detection  Strategies 

Suppose  we  are  sampling  from  a  data  stream  of  values  of  the  form 

x{m )  =  ,y(m)  +  n(m)  m  =  1, 2, . . . ,  M 

where  the  s(m)  are  values  of  a  signal  and  the  n(m)  are  Gaussian  white  noise  with  zero 
mean  and  variance  a2.  The  signal  may  or  may  not  be  present,  and  the  parameters  of  the 
signal  are  unknown.  Two  approaches  will  be  evaluated  in  this  paper:  the  first  is  a 
straightforward  normality  detector  and  the  second  makes  use  of  the  recurrence  matrix  of 
the  data. 


Normality  Detector 

If  the  signal  is  absent,  then  the  measured  values  x(tn)  represent  just  Gaussian  noise  n(m). 
Hence  the  histogram  generated  from  the  x(m)  values  should  fit  the  Gaussian  density 
function  of  the  background  noise.  This  hypothesis  is  tested  by  means  of  a  %  Goodness- 
of-fit  test  based  on  the  yj  statistic 


^  _  y’  (°k  ~ekY 

k= i  ek 

where  K  is  the  number  of  bins  of  the  histogram  and  the  Ok  and  cjk  are  the  observed  counts 
and  the  expected  Gaussian  counts  in  each  bin,  respectively.  The  statistic  S  is  compared  to 
the  appropriate  threshold  value  of  the  y  ~  distribution  with  K- 1  degrees  of  freedom.  If 
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S  <  xl  then  we  declare  no  signal  present.  However,  if  S>  xl,  then  we  declare  a  signal 
present. 

In  the  calculations  to  follow,  K=15  bins  proved  to  be  suitable  for  M=100  to  200 
(the  size  of  the  signals  considered  in  this  work).  The  threshold  used  in  this  detection 
scheme  corresponded  to  a  critical  region  of  size  0.05  for  the  y  distribution. 

Initially  the  Gaussian  noise  was  generated  using  the  MATLAB  random  number 
generator.  It  was  surprising  how  often  (approximately  5%  of  the  time)  such  values  failed 
the  %2  test  for  normality.  A  typical  example  is  shown  in  Figure  2.  Note  the 


Distance  r  Distance  r 


Fig.  2.  Comparison  of  MATLAB  generated  Gaussian  deviates  (left)  and  those 
generated  from  the  “ideal”  Gaussian  (right).  The  estimated  distribution  in  the  left  plot 
fails  the  chi-squared  test  for  normality  despite  the  fact  that  the  data  was  produced  by  a 
Gaussian  number  generator. 

frequent  presence  of  spikes  near  the  mean  of  the  plot.  Such  anomalies  cannot  be 
tolerated  in  testing  our  technique  because  these  spikes  produce  false  alarms.  For 
detection  of  very  small  signals,  a  much  more  precise  representation  of  the  background 
Gaussian  noise  is  needed,  so  we  developed  a  method  for  constructing  data  sets  having  a 
nearly  ideal  normal  properties  [Appendix  C]. 


Recurrence  Plot  Detector 

In  this  case,  the  time-delay  vectors  Xx  for  i  =  1,  ...,N  are  constructed  from  the  x(m)  values 
as  described  before,  thus  incorporating  the  time  signature  of  the  original  time  series. 
Each  of  these  vectors  has  length  n,  where  n  is  the  embedding  dimension,  and  each 
component  of  X\  is  a  value  in  the  original  data  stream  which  samples  are  assumed  to  be 
independent.  The  NxN  unthresholded  recurrence  plot  has  entries 
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which  is  just  the  distance  r  between  the  vectors  as  described  above.  If  there  is  no  signal 
present  then  the  original  samples  come  from  a  Gaussian  distribution  with  zero  mean  and 
variance  a2  and  so  the  non-diagonal  entries  in  the  unthresholded  recurrence  plot  follow 
the  Gamma-like  distribution  given  in  Equation  (3).  To  avoid  duplications,  we  only  use 
the  N(N-\)/2  entries  above  the  main  diagonal.  This  technique  could  also  be  based  on  r2 
values,  and  then  Equation  (4)  would  be  used. 

The  procedure  is  now  similar  to  the  first  technique.  If  there  is  no  signal  present, 
the  histogram  of  the  djj  values  should  fit  the  Gamma-like  distribution  of  Equation  (3). 
Again  a  %“  Goodness-of-fit  test  can  be  applied  with  a  threshold  based  on  the 
distribution  with  K- 1  degrees  of  freedom  if  a  is  known  (K-2  degrees  of  freedom  if  a  is 
estimated  from  the  data) 

We  now  have  many  more  data  points  than  in  the  corresponding  Normality 
Detector  Test,  but  A=15  bins  still  proved  to  be  sufficient  for  a  valid  x  test.  Again,  if  the 
X  value  was  below  the  threshold,  we  declared  the  signal  to  be  absent;  if  above  the 
threshold,  we  declared  the  signal  present.  In  this  case,  however,  we  found  that  for  the 
types  of  signals  treated  in  this  analysis,  too  many  false  positives  occurred  when 
comparing  S  to  the  threshold  Xo  •  That  is,  the  value  of  S  was  often  rather  large  even  with 
no  signal  present,  but  much  larger  when  the  signal  was  present.  This  effect  is  probably 
due  to  the  redundancies  in  the  recurrence  matrix  data.  The  derivation  of  the  null 
distribution  assumed  independent  distances  when  in  reality  these  distances  may  contain 
redundancies.  The  result  is  that  the  size  of  the  test  (the  probability  that  the  null 
hypothesis  is  rejected)  is  not  commensurate  with  the  observed  Type-I  error.  In  this  case, 
if  the  threshold  Xo  is  set  such  that  we  expect  5%  false  alarms,  the  correlations  in  the  data 
result  in  a  percentage  of  rejections  that  can  be  much  larger  than  5%.  We  therefore 
calibrated  our  test  by  using  an  adjustment  factor  k  to  multiply  the  statistic,  S.  This 
factor  reduces  the  number  of  false  positives  to  the  correct  level  thus  allowing  for  a  more 
meaningful  test  .  Calculations  have  shown  that  small  values  n= 2,3,4  of  the  embedding 
dimension  work  best  in  this  detector,  and  for  these  values  k=0.5,0.3,0.  1  respectively. 

The  number  of  data  can  also  influence  the  choice  of  k.  We  have  found  that  larger  values 
for  M  require  a  decrease  in  k  value. 

Embedding  dimension  n  is  one  important  sensitivity  parameter;  others  are  the 
delay  time,  L  and  the  sampling  rate  of  the  original  data  stream.  Of  course  it  is  desirable 
that  our  detection  technique  will  be  effective  for  many  signal  types.  The  optimal  values 
of  the  key  sensitivity  parameters  will  generally  be  a  function  of  the  type  of  signal 
received.  Consequently,  in  practice,  a  bank  of  parameter  values  should  be  swept  through 
to  test  for  the  presence  of  an  unknown  signal. 

Both  detectors  (nonnality  and  recurrence)  are  compared  here  in  terms  of  their 
Receiver  Operating  Characteristic  (ROC)  performance.  The  ROC  curve  simply  displays 
the  probability  of  detection  Pd  versus  the  probability  of  false  alarm  Pfa  associated  with 
these  detectors  as  a  function  of  the  detection  threshold.  The  signal  of  interest  was  taken 
to  be  a  10Hz  sine  wave  with  additive  Gaussian  noise.  The  signal-to-noise  ratio  (SNR) 
was  taken  as  the  average  power  of  the  signal  divided  by  the  variance  of  the  noise.  Figure 
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(3)  shows  some  typical  results  comparing  detector  performance  as  a  function  of  signal  to 
noise  ratio. 
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Fig.  3.  ROC  curves  associated  with  detecting  a  10  Hz.  sine -wave  in  Gaussian  noise 
for  varying  SNR  levels.  Sampling  parameters  were  M=100  and  At=0.01s. 

Embedding  parameters  were  n= 3,  L= 5.  Threshold  factor  was  k=0.3. 

The  embedding  parameters  used  in  this  example  were  M=  100,  n= 3,  L=  5.  The  signal  was 
sampled  at  intervals  of  At=0.01s.  For  the  recurrence-based  detector  the  threshold 
adjustment  factor  was  set  to  k=0.3.  The  results  of  Figure  (3)  are  typical  of  those  found 
using  other  embedding  parameters.  For  low  SNR  values,  the  normality  detector  easily 
outperforms  the  recurrence  detector.  However,  for  intermediate  SNR  levels  there 
typically  exists  a  range  where  the  recurrence-based  detector  shows  superior  ROC 
performance.  Changing  the  embedding  parameters  does  not  seem  to  affect  the  result. 

For  example,  using  a  dimension  of  n= 2  and  L=  8  and  k=0.5  results  in  the  ROC  curves  of 
Fig  (4). 


Again,  for  low  SNR  values  (<10dB)  the  chi-square  test  for  non-nonnality  is  the 
best  performer  while  for  higher  SNR  values  the  recurrence-based  detector  shows  the  best 
ROC  performance.  The  influence  of  the  number  of  points  and  sampling  rate  were  also 
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Fig.  4.  ROC  curves  associated  with  detecting  a  10  Hz.  sine-wave  in  Gaussian  noise 
for  varying  SNR  levels.  Sampling  parameters  were  M  =  100  and  At  =  0.01s. 

Embedding  parameters  were  n  =  2,  L  =  8.  Threshold  factor  was  k=  0.5. 

explored.  Fig  (5)  shows  the  results  of  increasing  the  number  of  data  to  M  =200  points 
and  decreasing  the  sampling  interval  to  At  =  0.005s  (thus  the  total  duration  of  the  signal 
is  kept  constant  from  the  previous  examples).  The  embedding  parameters  used  were 
n  =  2,  L  =  8  as  in  the  previous  example.  For  this  number  of  data,  an  appropriate  threshold 
value  was  determined  to  be  k  =  0.3. 

As  before  there  exists  a  range  of  SNR  values  below  which  the  normality  detector 
yields  the  more  powerful  result.  Above  -12dB  SNR,  however,  the  recurrence-based 
detector  again  shows  superior  performance  until  both  methods  converge  to  unity  for  the 
probability  of  detection  for  any  probability  of  false  alarm. 

The  results  shown  above  hold  for  a  variety  of  both  embedding  and  sampling 
parameters.  For  n> 3,  however,  the  redundancies  in  the  recurrence  plot  become  much 
larger  so  that  the  threshold  adjustment  factor  must  be  set  to  values  k<0.1.  It  is  therefore 
apparent  that  there  does  exist  a  range  of  SNR  values  for  which  the  recurrence-based 
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Fig.  5.  ROC  curves  associated  with  detecting  a  10  Hz.  sine -wave  in  Gaussian  noise 
for  varying  SNR  levels.  Sampling  parameters  were  M= 200  and  At=0.005s. 
Embedding  parameters  were  n= 2,  L= 8.  Threshold  factor  was  k=0.3. 


detection  schemes  have  some  merit  over  the  more  general  normality  detector  in  detecting 
the  presence  of  a  sine -wave  in  Gaussian  noise.  It  should  be  stressed,  however,  that  the 
value  of  these  detectors  is  that  they  do  not  assume  a  priori  knowledge  of  the  incoming 
signal.  If  it  was  known  that  a  sine-wave  was  the  signal  of  interest  an  optimal  detector  can 
be  derived,  depending  on  how  much  is  known  about  the  sine-wave  (frequency,  phase, 
etc.). 

As  another  test  of  the  recurrence-based  detector,  the  detection  of  a  square  wave  in 
additive  Gaussian  noise  was  considered.  The  embedding  parameters  n  =  2,  and  L  =  8 
were  used  along  with  the  sampling  parameters  of  M  =  200,  At  =  0.01.  Figure  (6)  shows 
these  results. 

Even  for  a  very  different  type  of  signal,  the  same  trend  in  ROC  curves  persists  as  a 
function  of  SNR. 
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Fig.  6.  ROC  curves  associated  with  detecting  a  1 0  Hz.  square-wave  in  Gaussian 
noise  for  varying  SNR  levels.  Sampling  parameters  were  M  =  100  and  At  =  0.01s. 
Embedding  parameters  were  n  =  2,  L  =  8.  Threshold  factor  was  k  =  0.3. 


Summary 

This  work  has  focused  on  the  detection  of  signals  in  noise  using  a  detector  based  on  the 
signal’s  recurrence  matrix.  Under  the  null  hypothesis  of  no  signal,  the  probability 
distribution  for  the  entries  of  the  recurrence  matrix  was  derived  and  found  to  be  a 
Gamma-like  distribution.  Using  this  distribution,  a  chi-squared  test  was  performed  in 
order  to  assess  whether  or  not  some  underlying  signal  was  present.  Due  to  redundancies 
in  the  recurrence  plot,  this  test  had  to  be  slightly  modified  in  order  to  produce  a 
meaningful  Type-I  error.  Receiver  Operating  Characteristic  (ROC)  curves  were  then 
used  to  assess  the  performance  of  this  detector  as  a  function  of  embedding  parameters, 
signal  sampling  parameters,  and  signal-to-noise  ratio  (SNR).  Comparisons  were  drawn 
between  the  recurrence-based  detector  and  a  normality  detector  for  demonstrating  non¬ 
normality  of  the  incoming  signal.  Results  indicated  that  there  is  a  range  of  SNR  values 
for  which  the  recurrence-based  detector  offers  superior  performance  (in  the  sense  of 
lower  Type-II  error  for  a  given  Type-I  error)  over  the  nonnality  detector.  This  behavior 
was  observed  for  both  sine  wave  and  square  wave  signals  in  additive  Gaussian  noise. 
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Appendix  A. 


Calculation  of  the  Jacobian  of  the  Transformation  of  Variables. 

For  the  transformation  of  variables  (jCj ,  x2 , . . . ,  xn )  — >  [r,  6X , . . . ,  0n  , ) 

xx  =  r  cos  6X 
x2  =  r  sin  0X  cos  d2 
x3  =  r  sin  6X  sin  02  cos  d2 

xn_ i  =  r  sin  9X  sin  02 . . .  sin  0n2  cos  6n  , 
xn  =  r sin 6{  sin d2... sin 0n_2  sin 6n  , 

where  0  <  r  <  oo  and  0  <  6t  <  n  ,  the  Jacobian  is  given  by 

o 
o 
0 

in  0n_ 2  sin  0n 
.sin 0n_2  cos 


we  shall  prove  that 

J n  =  r H_1  (sin  9X )"  ‘  (sin 02 )"  \ .  sin  6n 

Consider  Jn  for  a  few  special  cases: 

For  n  =  2: 


COS0I 

sin  6{  cos  d2 
sin  9,  sin  ft  cos  9, 


-r  sin  9\ 
r cos  ft  cos  ft 
r  cos  ft  sin  ft  cos  ft 


-r  sin  ft  sin  ft 
r  sin  ft  cos  ft  cos  ft 


sin  ft  sin  ft . . .  sin  ftf_,  cos  ft_j  r  cos  ft  sin  ft . . .  sin  ft,_,  cos  ft,_,  r  sin  ft  cos  ft . . .  sin  ft1-2  cos  ft,. 
sinft...sin0  .  rcosft  sinft  ...sini?  .  rsinft  cosft  ...sini?  , 


0 

0 

-r  sin  ft  sin  ft,  sin  ft. 


— r  sin  ft  ...  si 
r  sin  ft  sin  ft, . . 


J2  = 


cos  6X  -r  sin  6X 
sin  6X  r  cos  6X 


=  r 


For  n  =  3:  by  expanding  by  cofactors  of  the  first  row 
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cos  9X  — r  sin  9X  0 

J3  =  sin  9X  cos  92  r  cos  9X  cos  92  -r  sin  6X  sin  92 

sin  9X  sin  92  r  cos  9X  sin  92  r  sin  9X  cos  92 

=  cos  9xr2  sin  9X  cos  9X  +  r  sin  9xr  sin2  9X  =  r2  sin  9X 

For  n  =  4:  by  expanding  by  cofactors  of  the  first  row 

cos  9X  -r  sin  ^  0  0 

sin  9X  cos  92  r  cos  9X  cos  92  -r  sin  9X  sin  92  0 

sin  9X  sin  92  cos  93  r  cos  9X  sin  92  cos  93  r  sin  9X  cos  92  cos  93  -r  sin  9X  sin  92  sin  93 

sin  9X  sin  92  sin  93  r  cos  9X  sin  92  sin  93  r  sin  9X  cos  92  sin  93  r  sin  9X  sin  92  cos  93 

r  cos  9X  cos  92  -r  sin  9X  sin  92  0 

=  cos  9X  r  cos  9X  sin  92  cos  93  r  sin  9X  cos  92  cos  93  -r  sin  9X  sin  92  sin  93 

r  cos  9X  sin  92  sin  93  r  sin  9X  cos  92  sin  93  r  sin  9X  sin  92  cos  93 

sin  9X  cos  92  -r  sin  9X  sin  92  0 

+r  sin  9X  sin  9X  sin  92  cos  93  r  sin  9X  cos  92  cos  93  -r  sin  9X  sin  92  sin  93 
sin  9X  sin  92  sin  93  r  sin  9X  cos  92  sin  93  r  sin  9X  sin  92  cos  93 

cos  92  -r  sin  92  0 

=  cos  9X  ( r  cos 9X ) (sin  9X ) (sin  9X )  sin 92  cos  93  r  cos 92  cos 93  -r  sin  92  sin 93 

sin  92  sin  93  r  cos  92  sin  93  r  sin  92  cos  93 
cos  92  -r  sin  92  0 

+r  sin  9X  (sin  9X )  (sin  9X )  (sin  9X )  sin  92  cos  93  r  cos  92  cos  93  -r  sin  92  sin  93 

sin  92  sin  93  r  cos  92  sin  93  r  sin  92  cos  93 

r  sin'  9X  cos'  9X  +  r  sin'  9X  sin'  9X  J  J3 

=  r  sin2  9xr2  sin  92  =  r  sin2  9X  sin  92 

In  general  we  will  demonstrate  that 

Jn  {r,9v92,...,  9n  , )  =  r'"'  (sin  9X  )'"2  (sin  92 )"  ’ ...  sin  6>„_2 

The  proof  is  by  induction  and  again  proceeds  by  expanding  by  cofactors  of  the  first  row. 
We  have  shown  it  above  for  n  =  2,3,4.  Assume  true  for  n  =  (K-\ ).  Then 
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JK{r,0x,...,0K_x)  = 


cos  ft 

-r  sin  ft 

0 

0 

0 

sin  ft  cos  ft 

7'  cos  ft  cos  ft 

— r  sin  ft  sin  ft 

0 

0 

sin  ft  sin  ft  cos  ft 

7-  cos  ft  sin  ft  cos  ft 

r  sin  ft  cos  ft  cos  ft 

-r  sin  ft  sin  ft  sin  ft  . 

0 

sin  ft  sin  ft ...  sin  ft_2  cos  ft 
sin  ft  . . .  sin  0K  X 

7-  cos  ft  sin  ft .. .  sin  ft,  ,  cos  ft  , 
r  cos  ft  sin  ft ...  sin  ft_, 

r  sin  ft  cos  ft  . . .  sin  ft  ,  cos  ft_, 
r  sin  ft  cos  ft ...  sin  ft_, 

-7- sin  ft  ...sin  ft_2  sin  ft  , 

. .  7-  sin  ft  sin  ft  ...  sin  ft_,  cos  ft_, 

r  cos  ft  cos  ft  -r  sin  ft  sin  ft 

r  cos  ft  sin  ft  cos  ft  r  sin  ft  cos  02  cos  ft 

=  cos  ft  :  : 

r  cos  ft  sin  d2 ...  sin  0K_2  cos  0K_X  r  sin  ft  cos  02 ...  sin  ft  ,  cos  ft_, 
r  cos  ft  sin  02...  sin  0K_2  sin  ft_,  r  sin  ft  cos  02 ...  sin  0K_2  sin  ft._2 

sin  ft  cos  02  -r  sin  ft  sin  02 

sin  ft  sin  02  cos  ft  r  sin  ft  cos  02  cos  03 

+7'  sin  ft  :  : 

sin  ft . . .  sin  0K_2  cos0K_x  r  sin  03  cos  02 . . .  sin  0K_:  cos  0K 
sin  9S . . .  sin  0K_2  sin  0K  l  r  sin  9X  cos  02 . . .  sin  0K_2  sin  0K 


cosft  -r  sin  02  ■■■  0 

sin  02  cos  03  v  cos  02  cos  03  ...  0 

=  cos6,l  (7-cos0l)(sin0l)A  ‘ 

sin  02 ...  sin  0K  2  cos  0K_,  r  cos  02 . . .  sin  0K_2  cos0K_x  ...  -r  sin  02 ...  sin  0K_2  sin  0K_ 

sin  02 ...  sin  0K_2  sin  0K_X  r  cos  02 . . .  sin  0K_2  sin  0K_X  ...  r  sin  02 ...  sin  0K_2  cos  0K_X 

cos  ft  -r  sin  ft  ...  0 

sin  ft  cos  03  7'  cos  ft  cos  03  ...  0 

+7-sinft  (sinft)^  ’  :  :  : 

sin  ft  ...  sin  0K  2  cos  0K_X  r  cos  ft  . . .  sin  0K_2  cos  ft._,  ...  -r  sin  ft ...  sin  ft.  2  sin  0K_ 

sin  ft . . .  sin  ft._,  sin  0K_X  r  cos  ft  ...  sin  0K_2  sin  ft.  ,  ...  r  sin  ft . . .  sin  ft  2  cos  ft_, 

=  1 7'  cos2  ft  ( sin  ft  )A  "  +7-  sin2  ft  (sin  ft  )A  ~^JK_x(r,02,...,OK_x ) 

=  7-(sinft)A  2 (sin ft )*  3(sinft)A  4 ...sinft._,J 

=  ?''f_l(sinft)A  _(sinft)A  3...sinft_2 


by  the  induction  hypothesis,  which  is  the  desired  formula  for  n  =  K. 


o 

o 

-7-  sin  ft  ...  sin  ft-_2  sin  0K_X 
r  sin  ft  ...  sin  ft  2  cos  ft_. 


0 

0 

-r  sin  ft  . . .  sin  ft_,  sin  ft_, 
7-  sin  ft  . . .  sin  ft.  ,  cos  ft_. 
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Appendix  B. 

Summary  of  Probability  Density  Functions 


The  following  list  provides,  for  ready  reference,  the  probability  density  functions 
required  in  the  body  of  the  report.  The  variables,  x;  xi,X2,. .  .,x„;  yi,V2,. .  .yn  are 
independent  Gaussian  random  variables,  each  with  zero  mean  and  variance  a2 


P(x)  =  —pr= 

cry  in 
p(w  =  x2)  =  — l-f 


-X2 /2a 


-00  <  x  <  00 

-1/2  -w/2a2 


<J 


w  e 


0  <  w<  oo 


P 


2,2,  ,2 

Xj  +x2  +  ...  +  xn 


1 

v2y 


«T-' 

<j  r 


v^y 


k-1  -r2 1 2<j2 

r  e 


0  <  r  <  oo 


u  —  X2  +  x2  + . . .  +  x,t )  — 


0= 


I 

v2y 


<rT 


v^y 


un/2-le-ul2„  0  <  M  <  00 


=  V(^l  _-V!)2  +(^2  -  V2)2  +  -  V„)2  )  = 


1 


4 
cr'T 


V^y  n-1  -r2  /4a2 

—  f  n  r  e 

I  T/i  \ 


v^y 


/>(w  =(xj -y,)2  +(x2 -y2)2  +  ...  +  (x„ -y„)2)  =  ■ 


II 

V4y 


crT 


v^y 


u„,2-le-u/4* 


0  <  r  <  oo 


0  <  u  <  oo 
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Appendix  C. 

Construction  of  a  data  set  having  “Ideal”  Gaussian  properties 

As  mentioned  in  the  text,  computer-generated  histograms  of  Gaussian  distributions  often 
(approximately  5%  of  the  time)  fail  a  chi-square  Goodness-of-Fit  test  for  normality. 

Such  distributions  are  unacceptable  representations  of  the  background  noise  for  the 
purposes  of  this  analysis.  Two  such  examples  are  shown  in  Fig.  AB1  below. 


Fig.  Cl  Two  example  plots  showing  the  observed  and  expected  counts  associated 
with  1 0000  Gaussian  deviates  generated  via  the  MATLAB  random  number 
generator.  The  number  of  bins  was  set  to  K=50.  In  both  examples  the  data  failed  the 

chi-square  Goodness-of-Fit  test  for  threshold  value  %0  =59.3  (corresponding  to  a 
Type-I  error  of  5%). 

To  test  the  detection  strategies  in  this  paper  a  nearly  perfect  Gaussian 
representation  of  the  noise  is  needed.  In  this  Appendix,  a  method  is  shown  to 
mathematically  construct  an  ideal  approximation  to  a  Gaussian  distribution  of  white 
noise;  that  is,  a  finite  sequence  of  numbers  with  the  following  properties: 

1)  Gaussian-shaped  histogram  of  amplitudes 

2)  Delta-function  autocorrelation  function  and  therefore  a  flat  power  spectral 
density  function. 


Consider  the  particular  normal  distribution  A(0,1)  with  zero  mean  and  unit 
variance.  The  probability  that  such  a  random  variable  x  occurs  in  the  range  a<x<b  is 
given  by 


P[a<x<b ) 


b 

J  exp(-x2  /  2 )dx 

a 


erf[b!  y/l^j-erf^a/  j 


If  x(t)  is  a  time  series  whose  values  are  independently  sampled  from  the  above 

1  -J 

distribution,  then  its  autocorrelation  function  is  given  by 
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R.xX  (t,t  +  r)  =  E  [ x(t)x(t  +  r)] 


xeJV(0,l) 


>S{r) 


and  its  two-sided  power  spectral  density 


oo 

Sja^=2n  \R^e~iWTdr' 


xsN{  0,1) 


2  n 


That  is,  the  power  spectral  density  is  flat,  independent  of  frequency. 

The  process  for  constructing  a  data  vector  having  the  desired  properties  is  as  follows. 

1 .  Choose  the  number  of  points  M  in  the  data  vector,  the  number  of  bins  K,  and  the 
maximum  and  minimum  values  of  the  histogram,  xmax  and  xmin,  respectively. 


2.  Then  the  width  of  each  bin  is  A  =  (xmax- xmin)/  K , 

bin  edges  occur  at  xe(k)  =  x  min-l- (k  - 1)  A  for  k  =  1,2,- --,[K  + 1)  (note  that  this 

includes  the  lower  edge  of  the  lowest  bin  and  the  upper  edge  of  the  upper  bin),  and 
bin  centers  occur  at  xc(y)  =  xmin+(y  -1)A/  2  for  j  =  1,2,  ■  •  ■ ,  K  . 

Hence  there  are  K  bin  centers  but  (K+l)  bin  edges. 

3.  The  exact  (non- integer)  number  of  entries  in  each  bin  is  given  by 


v(j)  =  (M  /  2) 


erf  [xe(  j  + !)/ V2  j  -  erf  ^ xe ( j ) / V2  j 


j  =  \,2,-,K 


We  calculate  the  integer  number  of  entries  in  each  bin  vr  ( j)  using  Matlab  by 
rounding  to  the  nearest  integer,  vr(j)  =  round (v(y)) .  Hence,  the  “error”  in  the  count  in 
each  bin  is  dv(  j)  =  v(j)  -  vr(j). 

4.  Once  the  integer  number  of  points  in  each  bin,  vr  ( j) ,  is  known  we  construct  an 
actual  set  S j  of  data  values  for  each  bin  by  choosing  the  appropriate  number  of  points 
from  a  uniform  distribution  on  the  interval  of  values  using  the  rule: 

sj  =  {xeU)  +  A(ranJ(vl.(y),l))}  . 

Here,  the  Matlab  function  rand(vr(  j),  1) returns  a  set  of  vr(j)  values  uniformly 
distributed  on  the  interval  (0,1). 

5.  Append  together  all  K  number  sets  SJ  to  form  an  M-dimensional  data  vector  x  . 

This  data  vector  has  the  required  property  of  Gaussian  distribution  of  amplitudes  but, 
due  to  the  way  the  numbers  were  obtained,  the  ordering  of  the  values  contains 
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unwanted  structure  -  structure  that  can  be  eliminated  by  properly  “shuffling”  the 
vector  components.  This  is  accomplished  in  Matlab  by  first  generating  a  uniformly- 
distributed  random  vector  f  of  the  same  length  as  x  .  The  elements  of  r  are  then 
sorted  in  ascending  (or  descending)  order.  Since  the  elements  of  r  are  randomly 
distributed,  the  indices  of  the  sorted  values  should  be  randomly  distributed  and  we 
use  this  property  to  shuffle  the  ordering  of  the  values  in  x' .  The  values  in  x'  are  re¬ 
arranged  using  the  indexing  obtained  from  sorting  the  values  inr  ,  resulting  in  the 
final  data  vector  x  . 


This  final  data  vector  now  has  all  the  required  properties:  1)  Gaussian  distribution 
of  amplitudes,  2)  flat  power  spectral  density,  and  3)  delta-function-like  autocorrelation 
function. 

It  should  be  noted  that  the  approach  described  above  generally  yields  a  histogram 
that  is  nearly,  but  not  perfectly,  Gaussian  and  that  generally  contains  nearly,  but  not 
exactly,  the  number  of  points  M  specified  at  the  beginning  of  the  process.  In  some  cases, 
the  agreement  can  indeed  be  perfect  but  this  is  only  an  accident.  The  reason  is  simple. 
Although  the  fraction  of  the  total  number  of  points  that  should  reside  in  any  bin  of  the 
histogram  can  be  calculated  to  any  degree  of  precision,  there  is  no  guarantee  that  this 
fraction  of  the  total  number  of  points  is  an  integer.  Hence,  the  best  we  can  do  is  to  round 
the  required  number  of  points  to  the  nearest  integer.  The  resulting  errors  are  surprisingly 
small  as  shown  below. 

Figure  C.2  shows  the  histogram  of  actual  (non-integer)  values,  the  histogram  of 
integer  values,  and  the  error  in  each  of  32  bins  for  M=  1024  data  points.  In  this  case  the 
actual  size  of  the  data  vector  was  1026  . 


Bins  of  x-values  Bins  of  x-values 

(a)  (b) 

Fig.  C2.  Comparison  of  histograms  for  (a)  calculated  (non-integer)  values,  and  (b) 
values  rounded  to  nearest  integer. 
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These  two  histograms  are  nearly  identical  with  differences  in  the  counts  in  each  bin  due 
entirely  to  the  rounding  process.  The  differences,  plotted  below  in  Fig.  C3,  are  no  longer 
than  approximately  0.5  counts. 


g 

b 


1 


Bins  of  x-values 


Fig.  C3.  Discrepancy  in  bin  occupancy  between  actual  values  and  rounded  integer 
values. 


The  sets  of  values  for  each  bin  are  shown  in  Fig.  C4(a)  while  the  shuffled  version  of  the 
same  set  of  values  is  shown  in  Fig.C4(b)  indicating  at  least  qualitatively  the  effectiveness 
of  the  shuffling  procedure. 
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Finally,  the  power  spectrum  and  autocorrelation  are  shown  in  Figs.  C5  and  C6, 
respectively. 


Fig.  C5.  Power  spectrum  of  the  shuffled  data  vector  in  Fig.  C4(b). 


2000 


Fig.  C6.  Autocorrelation  of  the  shuffled  data  vector  in  Fig.  C4(b). 
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